2.7 Stein's phenomenon and James-Stein estimators. Let $|y| := (y_1^2 + \cdots + y_d^2)^{1/2}$
for $y \in \mathbb{R}^d$. Consider the normal location family $N(\mu, I)$, $\mu \in \mathbb{R}^d$, on $\mathbb{R}^d$ having density
$(2\pi)^{-d/2} \exp(-|x - \mu|^2/2)$ with respect to Lebesgue measure $dx$, where $I$ is the $d \times d$
identity matrix. The problem is to estimate the unknown $\mu$ from an observation $x$. Here
for simplicity $n$ is taken equal to 1. If we had $n$ i.i.d. observations $X_1, \ldots, X_n$, then
$\bar{X} := (X_1 + \cdots + X_n)/n$ is a minimal, Lehmann-Scheffé sufficient statistic having distribution
$N(\mu, I/n)$, and $\sqrt{n}\,\bar{X}$ is such a statistic with distribution $N(\sqrt{n}\,\mu, I)$, so the situation would
not be essentially different.

The observation $x$ is an unbiased estimator of $\mu$, in other words $E_\mu x_i = \mu_i$ for $i = 1, \ldots, d$.
The information inequality holds in this case and gives for each $i$, $E_\mu (T_i - \mu_i)^2 \ge 1$
for each unbiased estimator $T_i$ of $\mu_i$. Thus $E_\mu(|T - \mu|^2) \ge d$ for each unbiased estimator
$T$ of $\mu$. This lower bound is attained by $T = x$.

It turns out, however, that for $d \ge 3$, $T = x$ is an inadmissible estimator of $\mu$ for
squared-error loss. This fact is called "Stein's phenomenon," after Charles Stein, who
discovered it. Let

$$J(x) := \left(1 - \frac{d-2}{|x|^2}\right) x,$$

called a James-Stein estimator of $\mu$. Then for $d \ge 3$, $r(\mu, J) < r(\mu, x)$ for all $\mu$, as will
be proved for $d = 3$. It's very surprising that although the coordinates $x_i$ are
independent for any $\mu$, the $x_j$ for $j \ne i$ are useful in estimating $\mu_i$. For $d = 2$, simply
$J(x) = x$. For $d = 1$, $J$ would be a bad estimator with infinite risk for squared-error loss
because of a singularity at $x = 0$. Note however that for $d \ge 3$, due to the factor $|x|^{d-1}$
in the volume element in spherical coordinates, $|x/|x|^2| = |x|^{-1}$ and $|x/|x|^2|^2 = |x|^{-2}$ are
integrable for any normal law despite being unbounded near 0.

The estimator $J$ is not admissible either; $[\max(0, 1 - (d-2)|x|^{-2})]\,x$ is a better estimator,
but it is still not admissible (see the Notes). Here, it will just be shown that Stein's
phenomenon occurs for $d = 3$ with the James-Stein estimator $J$.
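
As an informal numerical illustration of these claims (not part of the proof), the following sketch estimates by Monte Carlo the risks of $x$, of $J(x) = (1 - (d-2)/|x|^2)x$, and of the positive-part variant $[\max(0, 1 - (d-2)|x|^{-2})]x$ for $d = 3$. The true mean `mu`, the number of draws, the seed, and the helper names `james_stein`, `positive_part`, and `mc_risk` are all assumptions chosen for illustration. Since $|x|^{-2}$ has infinite variance when $d = 3$, the averages for $J$ converge slowly, so the numbers are only indicative.

```python
import random

def james_stein(x):
    """J(x) = (1 - (d-2)/|x|^2) x, the James-Stein estimator."""
    d = len(x)
    shrink = 1.0 - (d - 2) / sum(v * v for v in x)
    return [shrink * v for v in x]

def positive_part(x):
    """[max(0, 1 - (d-2)/|x|^2)] x, the positive-part variant."""
    d = len(x)
    shrink = max(0.0, 1.0 - (d - 2) / sum(v * v for v in x))
    return [shrink * v for v in x]

def mc_risk(estimator, mu, n_draws=50_000, seed=0):
    """Monte Carlo estimate of E_mu |T(x) - mu|^2 for x ~ N(mu, I)."""
    rng = random.Random(seed)  # same seed => same draws for every estimator
    total = 0.0
    for _ in range(n_draws):
        x = [m + rng.gauss(0.0, 1.0) for m in mu]
        t = estimator(x)
        total += sum((ti - mi) ** 2 for ti, mi in zip(t, mu))
    return total / n_draws

mu = [1.0, 0.0, 0.0]  # hypothetical true mean, d = 3
risk_x = mc_risk(lambda x: x, mu)
risk_js = mc_risk(james_stein, mu)
risk_pp = mc_risk(positive_part, mu)
print("x:", risk_x, "J:", risk_js, "positive part:", risk_pp)
```

Reusing the seed gives all three estimators the same simulated observations, so the comparison is not blurred by independent sampling noise.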

2.7.1 Proposition. For $d = 3$ and the estimators $J(x) := (1 - |x|^{-2})x$ and $x$ for the
mean $\mu$ in the normal location family $\{N(\mu, I) : \mu \in \mathbb{R}^3\}$, we have $r(\mu, J) < r(\mu, x)$ for
all $\mu$. Thus $x$ is an inadmissible estimator of $\mu$.

Proof. Clearly $r(\mu, x) = 3$ for all $\mu$. It will be enough to show that for all $\mu$,

(2.7.2)    $f(\mu) := r(\mu, J) = g(\mu) := 3 - E_\mu(|x|^{-2}).$

It seems to be difficult to prove this directly, so it will be done by an indirect method as
follows. First, $f(\mu) = f_1(|\mu|^2)$ for some function $f_1$, since for any orthogonal transformation
($3 \times 3$ orthogonal matrix) $U$ from $\mathbb{R}^3$ to $\mathbb{R}^3$, $J(Ux) = UJ(x)$ and $N(\mu, I) \circ U^{-1} = N(U\mu, I)$,
so by the image measure theorem (RAP, 4.1.11),

$$\begin{aligned}
r(U\mu, J) = E_{U\mu}|J - U\mu|^2 &= \int |J(x) - U\mu|^2 \, dN(U\mu, I)(x) \\
&= \int |J(x) - U\mu|^2 \, d[N(\mu, I) \circ U^{-1}](x) = \int |J(Uy) - U\mu|^2 \, dN(\mu, I)(y) \\
&= \int |U(J(y) - \mu)|^2 \, dN(\mu, I)(y) = \int |J(y) - \mu|^2 \, dN(\mu, I)(y) = r(\mu, J).
\end{aligned}$$

For any $\mu$ and $\mu'$ in $\mathbb{R}^3$ with $|\mu| = |\mu'|$ there is an orthogonal $U$ with $U\mu = \mu'$, so indeed
$f(\mu)$ is a function, say $f_1$, of $|\mu|^2$. It is also easily seen that $f$ and $f_1$ are continuous, where
integrals can be bounded using spherical coordinates as mentioned above.
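
The equivariance $J(Ux) = UJ(x)$ used in this step can be checked numerically for one orthogonal map. The sketch below is only a sanity check under stated assumptions: the rotation angles, the test vector `x`, and the helper names (`matvec`, `rotation_z`, `rotation_x`, `apply_U`, `norm`) are hypothetical choices, and a single example does not replace the general argument.

```python
import math

def james_stein(x):
    """J(x) = (1 - (d-2)/|x|^2) x; for d = 3 the factor is 1 - |x|^{-2}."""
    d = len(x)
    shrink = 1.0 - (d - 2) / sum(v * v for v in x)
    return [shrink * v for v in x]

def matvec(U, x):
    """Multiply the matrix U (a list of rows) by the vector x."""
    return [sum(u * v for u, v in zip(row, x)) for row in U]

def rotation_z(theta):
    """Rotation by theta about the third coordinate axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_x(theta):
    """Rotation by theta about the first coordinate axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply_U(x):
    """Apply the orthogonal map U = R_z(0.7) R_x(-1.1) (hypothetical angles)."""
    return matvec(rotation_z(0.7), matvec(rotation_x(-1.1), x))

def norm(x):
    """Euclidean length |x|."""
    return math.sqrt(sum(v * v for v in x))

x = [0.3, -1.2, 2.0]            # hypothetical observation
lhs = james_stein(apply_U(x))   # J(Ux)
rhs = apply_U(james_stein(x))   # U J(x)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
norm_gap = abs(norm(apply_U(x)) - norm(x))
print(max_gap, norm_gap)
```

Both gaps should be at the level of floating-point rounding, reflecting that $J$ rescales $x$ by a factor depending only on $|x|$, which $U$ preserves.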

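Next, $g(\mu) = g_1(|\mu|^2)$ for some function $g_1$, by a similar but shorter sequence of
equations, where $g$ and $g_1$ are also continuous. Specifically, substituting $x = Uy$,

$$\begin{aligned}
E_{U\mu}(|x|^{-2}) &= (2\pi)^{-3/2} \int |x|^{-2} \exp(-|x - U\mu|^2/2)\,dx \\
&= (2\pi)^{-3/2} \int |y|^{-2} \exp(-|y - \mu|^2/2)\,dy = E_\mu(|x|^{-2}).
\end{aligned}$$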

For $A > 0$ let $E_A$ denote the expectation of functions of $\mu$ with respect to the $N(0, AI)$
distribution. It will be shown that for all $A > 0$,

(2.7.3)    $E_A f = 3 - \dfrac{1}{A+1} = E_A g.$

For $\mu$ with law $N(0, AI)$ and, given $\mu$, $x$ having law $N(\mu, I)$, the pair $(x, \mu)$ has a
jointly normal distribution on $\mathbb{R}^6$, where the three 2-vectors $(x_i, \mu_i)$ for $i = 1, 2, 3$ are i.i.d.
Let $E_{(A)}$ be expectation for this joint distribution. We have $E_{(A)} x_i = 0$ and $E(x_i^2 \mid \mu_i) = \mu_i^2 + 1$,
so $E_{(A)}(x_i^2) = A + 1$, while by Lemma 2.1.1

$$E_{(A)}(x_i \mu_i) = E_A E(x_i \mu_i \mid \mu_i) = E_A[\mu_i E(x_i \mid \mu_i)] = E_A \mu_i^2 = A.$$

Thus each $(x_i, \mu_i)$ has the bivariate normal law

$$N\left(0, \begin{pmatrix} A+1 & A \\ A & A \end{pmatrix}\right),$$

where $0$ is the two-dimensional zero vector. Now, we have

(2.7.4)    $E_A g = 3 - E_A E_\mu |x|^{-2} = 3 - E_{(A)} |x|^{-2}.$

If $y$ has law $N(0, I)$ on $\mathbb{R}^3$, then by spherical coordinates,

$$E|y|^{-2} = (2\pi)^{-3/2}\, 4\pi \int_0^\infty \exp(-r^2/2)\,dr = 1.$$

For $x$ with law $N(0, cI)$, $x/c^{1/2}$ has law $N(0, I)$, so

$$E(|x|^{-2}) = E\big(|x/c^{1/2}|^{-2}\big)/c = 1/c.$$

Thus in (2.7.4), $E_{(A)}(|x|^{-2}) = 1/(A+1)$, and (2.7.3) holds for $g$.
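
The two moment identities just used, $E|y|^{-2} = 1$ for $N(0, I)$ on $\mathbb{R}^3$ and $E|x|^{-2} = 1/c$ for $N(0, cI)$, can be confirmed by direct numerical integration of the radial integral. This is a minimal sketch: the function name `inv_sq_moment`, the trapezoid rule, the cutoff, the step count, and the value of `A` are assumptions made for illustration.

```python
import math

def inv_sq_moment(c, steps=20_000, cutoff=12.0):
    """Approximate E|x|^{-2} for x ~ N(0, c I) on R^3 by the trapezoid rule.

    In spherical coordinates the factor r^{-2} cancels the r^2 of the volume
    element, leaving (2 pi c)^{-3/2} * 4 pi * integral_0^inf exp(-r^2/(2c)) dr.
    """
    upper = cutoff * math.sqrt(c)  # the tail beyond this point is negligible
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-upper * upper / (2.0 * c)))
    for k in range(1, steps):
        r = k * h
        total += math.exp(-r * r / (2.0 * c))
    return (2.0 * math.pi * c) ** -1.5 * 4.0 * math.pi * total * h

A = 2.5  # hypothetical prior variance; then x has law N(0, (A + 1) I)
print(inv_sq_moment(1.0), inv_sq_moment(A + 1.0), 1.0 / (A + 1.0))
```

Quadrature is used here rather than simulation because $|x|^{-2}$ has infinite variance in three dimensions, which makes sample averages converge slowly.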
We have for each $i$, $E_{(A)}(\mu_i \mid x_i) = b x_i$ where, using Lemma 2.1.1 again,

$$A = E_{(A)}(\mu_i x_i) = E_{(A)}[E_{(A)}(\mu_i x_i \mid x_i)] = E_{(A)}(x_i\, b x_i) = b(A+1)$$

gives $b = b_A := A/(A+1)$. Thus given $x_i$, $\mu_i$ has law $N(b_A x_i, \tau^2)$ where $E_A \mu_i^2 = A =
\tau^2 + b_A^2 (A+1)$ implies $\tau^2 = A/(A+1)$. In other words given $x$, $\mu = b_A x + \zeta$ where $\zeta$ is
independent of $x$ and has law $N(0, AI/(A+1))$. Thus for conditional expectations given
$x$ we have

$$E_{(A)}\big(|J(x) - \mu|^2 \mid x\big) = E_{(A)}\Big(\Big|(1 - |x|^{-2})x - \frac{A}{A+1}x - \zeta\Big|^2 \,\Big|\, x\Big) = \Big(\frac{1}{A+1} - |x|^{-2}\Big)^2 |x|^2 + \frac{3A}{A+1},$$

so for the unconditional expectation,

$$\begin{aligned}
E_{(A)}\big(|J(x) - \mu|^2\big) &= \frac{3A}{A+1} + (A+1)^{-2} E_{(A)}|x|^2 - \frac{2}{A+1} + E_{(A)}|x|^{-2} \\
&= \frac{3A}{A+1} + \frac{3}{A+1} - \frac{2}{A+1} + \frac{1}{A+1} = 3 - \frac{1}{A+1},
\end{aligned}$$

proving (2.7.3) for $f$, and so finishing its proof.
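
The algebra in this last step is easy to get wrong by a sign, so here is a small exact-arithmetic check using rational numbers. It is a sketch only: the function names `posterior_constants` and `averaged_risk_js` and the sample values of `A` are hypothetical, and the moments $E_{(A)}|x|^2 = 3(A+1)$ and $E_{(A)}|x|^{-2} = 1/(A+1)$ are taken as given, using that $x$ has marginal law $N(0, (A+1)I)$ on $\mathbb{R}^3$.

```python
from fractions import Fraction

def posterior_constants(A):
    """Return (b_A, tau^2) for prior variance A, as exact fractions."""
    A = Fraction(A)
    b = A / (A + 1)              # from A = b (A + 1)
    tau2 = A - b * b * (A + 1)   # from A = tau^2 + b^2 (A + 1)
    return b, tau2

def averaged_risk_js(A):
    """E_(A)|J(x) - mu|^2 for d = 3, assembled term by term."""
    A = Fraction(A)
    _, tau2 = posterior_constants(A)
    e_x_sq = 3 * (A + 1)         # E_(A)|x|^2, since x ~ N(0, (A + 1) I)
    e_x_inv_sq = 1 / (A + 1)     # E_(A)|x|^{-2} = 1/(A + 1)
    # 3 tau^2 is E|zeta|^2; the rest expands (1/(A+1) - |x|^{-2})^2 |x|^2.
    return 3 * tau2 + e_x_sq / (A + 1) ** 2 - 2 / (A + 1) + e_x_inv_sq

for A in (Fraction(1, 2), Fraction(1), Fraction(7)):
    print(A, posterior_constants(A), averaged_risk_js(A), 3 - 1 / (A + 1))
```

For each sample `A` the last two printed values should agree exactly, matching $3 - 1/(A+1)$.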

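Now, for the family of laws $N(0, AI)$, $A > 0$, on $\mathbb{R}^3$, the squared length of the variable
(here $|\mu|^2$) is a Lehmann-Scheffé sufficient statistic, from the exponential form of the density
(Theorem 2.5.10). By (2.7.3), $E_A[(f_1 - g_1)(|\mu|^2)] = 0$ for all $A > 0$, so by the
Lehmann-Scheffé property it follows that $f_1 = g_1$ almost everywhere, hence everywhere by
continuity, and (2.7.2) follows, completing the proof. □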

If we let $\mu$ have a prior distribution $N(0, AI)$, it follows from the above proof that $bx$
is a Bayes estimator of $\mu$ for $b = A/(A+1)$, since for squared-error loss the Bayes estimator
is the mean of the posterior distribution (by Proposition 2.6.1). So, up to equality almost
surely, $bx$ is the unique Bayes estimator. Thus for $0 < b < 1$, $bx$ is admissible by Theorem
1.2.5. Such an estimator, however, has a large bias and large risk when $|\mu|$ is large. Letting
$b \uparrow 1$ we see that the inadmissible estimator $x$ is a limit of admissible estimators $bx$. It is
much harder to give admissible estimators of $\mu$ which are better than $x$ for $d \ge 3$ (see the
Notes).

The estimator $x$, or $\bar{X}$ for any $n$, for $\mu$ is admissible for $d = 1$ (Lehmann, 1991, pp.
265-267 gives two proofs), for squared-error loss and many other loss functions. Stein
(1956) showed that $\bar{X}$ is admissible for $d = 2$.

Recalling the notion of minimax decision rule, as defined in Sec. 1.2, the estimator $x$,
or $\bar{X}$ for any $n$, is a minimax estimator of $\mu$ for any dimension $d$. To see this one can use
again the fact that $bx$ for $0 < b < 1$ is admissible. We have $r(\mu, bx) = db^2 + (1-b)^2|\mu|^2$,
which is minimized with respect to $\mu$ when $\mu = 0$ (or $b = 1$), with $r(0, bx) = db^2$. Letting
$b \uparrow 1$ ($A \to +\infty$) we see that the minimax risk is $d$, which is the risk of $x$ for all $\mu$. The
supremum over $\mu$ of the risk for a James-Stein estimator, or any other estimator better
than $x$, is also $d$. For the James-Stein estimator one can see that the risk $r(\mu, J)$ approaches
$d$ as $|\mu| \to \infty$. On the other hand the estimators $bx$ for $0 < b < 1$ are not minimax, in fact
$\sup_\mu r(\mu, bx) = +\infty$. For $d \ge 3$, there is a large class of minimax estimators (Baranchik,
1970; Lehmann, 1991, Theorem 4.6.3).
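
The behavior of the linear estimators $bx$ can be tabulated with exact arithmetic. The sketch below evaluates the risk formula $r(\mu, bx) = db^2 + (1-b)^2|\mu|^2$. As a further step not spelled out in the text, it also averages that risk over the prior $N(0, AI)$, using $E|\mu|^2 = dA$ and the linearity of the risk in $|\mu|^2$, which gives the Bayes risk $dA/(A+1)$ at $b = A/(A+1)$. The function names `risk_bx` and `bayes_risk_bx` and the sample values of `A` and $|\mu|^2$ are assumptions for illustration.

```python
from fractions import Fraction

def risk_bx(b, mu_norm_sq, d):
    """r(mu, bx) = d b^2 + (1 - b)^2 |mu|^2 for squared-error loss."""
    return d * b * b + (1 - b) ** 2 * mu_norm_sq

def bayes_risk_bx(A, d):
    """Average of r(mu, b_A x) over the prior N(0, A I), using E|mu|^2 = d A."""
    b = A / (A + 1)
    return risk_bx(b, d * A, d)

d = 3
for A in (Fraction(1), Fraction(10), Fraction(1000)):
    b = A / (A + 1)
    # Columns: A, Bayes risk, risk at mu = 0, risk at a hypothetical |mu|^2 = 10^8.
    print(A, bayes_risk_bx(A, d), risk_bx(b, 0, d), risk_bx(b, 10**8, d))
```

As `A` grows the Bayes risk climbs toward $d$ without reaching it, while for any fixed $b < 1$ the risk at large $|\mu|$ exceeds $d$, in line with $\sup_\mu r(\mu, bx) = +\infty$.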
PROBLEMS

1. Show that for $N(\mu, I)$ on $\mathbb{R}^d$, the estimator $bx$ for $\mu$ is inadmissible if $b < 0$ or $b > 1$.

2. Show that for a fixed vector $v \ne 0$ in $\mathbb{R}^d$, to estimate $\mu$ in $N(\mu, I)$, an estimator $bx + v$
is

(a) never admissible for $b = 1$,

(b) always admissible for $0 < b < 1$. Hint: $bx + v = b(x - w) + w$ where $w = v/(1 - b)$.
Consider a prior $N(w, AI)$ for suitable $A$.

3. For normal distributions $N(\mu, I)$ on $\mathbb{R}^d$, $\mu \in \mathbb{R}^d$, if $\mu$ has a prior distribution $N(0, I)$ and
$X_1$, $X_2$, and $X_3$ are observed, assumed to be i.i.d. $N(\mu, I)$, find the posterior distribution
of $\mu$.

NOTES 

This section is based on the exposition in Lehmann (1991, Secs. 4.5 and 4.6). Stein
(1956) discovered his phenomenon and James and Stein (1961) gave their estimator.
Strawderman (1971) gave a rather complicated estimator better than $x$, i.e. a minimax estimator,
which is admissible for $d \ge 6$, also mentioned by Lehmann (1991, p. 304).